
(CVPR 2018) Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Keyword [Semantic Embedding] [Transfer Knowledge] [GCN]

Wang X, Ye Y, Gupta A. Zero-shot recognition via semantic embeddings and knowledge graphs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6857-6866.



1. Overview


1.1. Motivation

two paradigms of transferring knowledge

  • use implicit knowledge representation (semantic embedding)
  • use explicit knowledge bases or knowledge graph

In this paper

  • based on Graph Convolutional Network (GCN)
  • predict visual classifier for each category
  • use both the (implicit) semantic embeddings and the (explicit) categorical relationships to predict the classifiers
  • Zero-Shot Learning
    • attribute
    • semantic embeddings
    • knowledge graph



2. Methods


2.1. GCN




A single GCN layer computes Z = ReLU(A X W), where

  • A [n x n]. normalized, binary adjacency matrix of the graph
  • X [n x k]. input feature matrix
  • W [k x c]. learnable weight matrix
  • Z [n x c]. output feature matrix
  • n. the number of categories (nodes of the graph)
  • ReLU. the nonlinearity between layers
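The layer above can be sketched in a few lines of numpy. The toy graph, normalization scheme, and sizes below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def gcn_layer(A_hat, X, W):
    """One graph-convolution layer: Z = ReLU(A_hat @ X @ W).

    A_hat: [n, n] normalized adjacency, X: [n, k] node features,
    W: [k, c] learnable weights, Z: [n, c] output features.
    """
    return np.maximum(A_hat @ X @ W, 0.0)

# Toy 3-node chain graph with self-loops (illustrative only).
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))   # degree normalization
A_hat = D_inv @ A                      # row-normalized adjacency

X = np.eye(3)                          # k = 3 input features
W = np.random.randn(3, 2)              # c = 2 output channels
Z = gcn_layer(A_hat, X, W)
print(Z.shape)                         # (3, 2)
```

Each node's output mixes its own features with its neighbors', which is what lets information propagate along graph edges.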

2.1.1. Training Time



  • only the first m entities are labeled during training
  • X = {x_1, x_2, …, x_n}, embeddings of all n entities (input to the GCN)
  • Y = {y_1, y_2, …, y_m}, labels of the m training entities
  • y_i ∈ {1, …, C}
  • C. the number of labels
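In this semi-supervised setting, the loss is computed only on the labeled nodes. A minimal sketch, assuming a cross-entropy objective masked to the first m entities (the exact objective here is an assumption for the generic GCN case):

```python
import numpy as np

def masked_cross_entropy(logits, labels, m):
    """Semi-supervised GCN loss: only the first m labeled nodes contribute.

    logits: [n, C] GCN outputs, labels: [n] class ids in {0..C-1},
    m: number of labeled training entities (the rest are unlabeled).
    """
    z = logits[:m] - logits[:m].max(axis=1, keepdims=True)   # stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(m), labels[:m]].mean()

logits = np.array([[2.0, 0.1], [0.2, 1.5], [0.0, 0.0]])  # n=3 nodes, C=2
labels = np.array([0, 1, 0])
loss = masked_cross_entropy(logits, labels, m=2)  # third node is unlabeled
print(round(loss, 4))
```

The unlabeled nodes still influence training indirectly, since their features are mixed into labeled nodes' representations by the adjacency matrix.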

2.1.2. Testing Time

  • predict labels for the remaining n − m (unlabeled) entities

2.2. GCN for Zero-Shot Learning

  • Input. a set of category embedding vectors (one per node)



  • Output. visual classifier for each input category (node)



  • visual feature. extracted by a fixed pre-trained CNN, dimension D

  • classifier. a D-dimensional weight vector for each node
  • 6-layer GCN

A direct alternative would be to regress w_i from x_i using only the m training pairs, but m is too small for such a mapping to generalize; the GCN instead propagates information through the knowledge graph.
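The zero-shot pipeline can be sketched as stacked GCN layers mapping word embeddings to classifier weights, with the paper's LeakyReLU (slope 0.2) and L2-normalized outputs. Layer sizes and the toy graph below are assumptions for illustration:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def predict_classifiers(A_hat, X, weights):
    """Stack GCN layers to map word embeddings to classifier weights.

    A_hat: [n, n] normalized adjacency; X: [n, k] category embeddings;
    weights: list of per-layer weight matrices, the last of shape [*, D].
    Returns L2-normalized D-dimensional classifiers, one per category node.
    """
    H = X
    for W in weights[:-1]:
        H = leaky_relu(A_hat @ H @ W)
    W_hat = A_hat @ H @ weights[-1]          # [n, D] predicted classifiers
    return W_hat / np.linalg.norm(W_hat, axis=1, keepdims=True)

# Toy example: n=4 category nodes in a chain, k=5 embedding dims, D=3.
rng = np.random.default_rng(0)
A = np.eye(4) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
A_hat = A / A.sum(axis=1, keepdims=True)
X = rng.standard_normal((4, 5))
layers = [rng.standard_normal((5, 8)), rng.standard_normal((8, 3))]
W_pred = predict_classifiers(A_hat, X, layers)
print(W_pred.shape)                          # (4, 3)
```

At test time, a predicted classifier for an unseen category is simply dotted with the D-dimensional visual feature of an image.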

2.2.1. Loss function



  • ground truth. classifier weights of the m seen categories, learned from training images (e.g., the last-layer weights of the pre-trained CNN); the loss is the mean-square error between these and the corresponding GCN outputs

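A minimal sketch of this objective, assuming a mean-square error averaged over the m seen categories (the 1/(2m) scaling is one common convention):

```python
import numpy as np

def gcn_zsl_loss(W_pred, W_true, m):
    """MSE between GCN-predicted classifier weights and ground-truth
    weights (from a pre-trained CNN), over the m seen categories only.

    W_pred: [n, D] GCN outputs; W_true: [m, D] ground-truth classifiers.
    """
    diff = W_pred[:m] - W_true
    return (diff ** 2).sum() / (2 * m)

W_pred = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # n=3 nodes
W_true = np.array([[1.0, 0.0], [0.0, 0.0]])              # m=2 seen classes
print(gcn_zsl_loss(W_pred, W_true, m=2))                 # 0.25
```

Only seen categories receive direct supervision; unseen categories' classifiers are shaped indirectly through the shared graph structure.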
2.3. Details

  • LeakyReLU (negative slope 0.2) leads to faster convergence
  • L2-normalizing the predicted classifiers is important
  • the last-layer classifiers of ImageNet pre-trained networks are found to be naturally normalized



3. Experiments


3.1. Dataset

  • relationships and graph structure (common-sense knowledge rules) from Never-Ending Language Learning (NELL)
  • images from Never-Ending Image Learning (NEIL)
  • construct a new knowledge graph based on NELL and NEIL (1.7M object entities, 2.4M edges)
  • use breadth-first search (BFS) with a maximum path length of 7 hops to prune the graph
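The hop-limited BFS pruning can be sketched as follows; the adjacency-dict representation and seed node are illustrative assumptions:

```python
from collections import deque

def bfs_within_hops(adj, source, max_hops=7):
    """Collect all nodes reachable from `source` within `max_hops` edges.

    adj: dict mapping node -> list of neighbor nodes.
    Returns a dict node -> hop distance from source.
    """
    visited = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if visited[node] == max_hops:
            continue                      # do not expand past the hop limit
        for nb in adj.get(node, []):
            if nb not in visited:
                visited[nb] = visited[node] + 1
                queue.append(nb)
    return visited

# Toy chain graph a-b-c-d; only nodes within 2 hops of 'a' are kept.
adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['c']}
print(bfs_within_hops(adj, 'a', max_hops=2))  # {'a': 0, 'b': 1, 'c': 2}
```

Limiting the hop count keeps the subgraph around relevant categories small enough for GCN training while discarding distant, weakly related entities.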


3.2. Ablation Study

3.2.1. Baseline



  • the performance gain over the baseline grows as the graph size increases

3.2.2. Missing Edge



  • the knowledge graph has redundant information (14K nodes with 97K edges connecting them)

3.2.3. Random Graph



3.2.4. Depth of GCN



  • optimization becomes harder as the network goes deeper

3.2.5. Differences between Word Embedding and Classifier



3.2.6. Are Word Embedding Methods Crucial?



3.3. Comparison